%Background

In this section, we review prior work on other forms of heterogeneity,
introduce the benefits and limitations of the steep-slope devices we
consider, and describe how we extrapolate from these device-level
effects to processor-level models.

\subsection{Heterogeneous architectures}
Prior work has addressed heterogeneity at several levels of
abstraction, from the technology level to the architecture level,
along with techniques for mapping applications onto such platforms.
The technology-level aspects of heterogeneous integration of CMOS with
emerging technologies using 3D technology have been explored
in~\cite{ionescu-3D}.  Static and dynamic techniques for optimizing
ILP and TLP in architecturally heterogeneous multicores have been
examined in~\cite{morphcore}, thermal-aware application scheduling
in~\cite{prometheus}, and voltage scaling techniques in 3D
architectures in~\cite{hhlee-3D-dvfs}; our work extends such
techniques to device heterogeneity as well.

\subsection{Steep Slope devices}
\emph{Steep-slope devices} have been proposed as an alternative that
overcomes the 60~mV/decade subthreshold-swing limit, which restricts
the scaling of conventional CMOS transistors. This property of CMOS causes
an exponential increase in leakage current as supply voltage
($V_{dd}$) scales to near- and sub-threshold values and threshold
voltage ($V_T$) remains roughly constant.  In contrast, steep-slope
devices do not suffer from this subthreshold limitation because they
use a different charge transport mechanism, which involves tunneling
through the intrinsic region.  Hence, at near-threshold and
subthreshold voltages, these steep-slope devices have the potential to
outperform CMOS devices by several orders of magnitude.
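
For reference, the 60~mV/decade figure follows from the standard
textbook expression for the subthreshold swing $S$ of a conventional
(thermionic-emission) MOSFET. This is a generic first-order sketch and
is not taken from the device models used in this work; the symbols
$C_{dep}$ (depletion capacitance), $C_{ox}$ (gate-oxide capacitance)
and $I_0$ (drain current at $V_{GS} = V_T$) are introduced here only
for illustration:
\[
S \equiv \left(\frac{\partial \log_{10} I_D}{\partial V_{GS}}\right)^{-1}
  = \ln(10)\,\frac{kT}{q}\left(1 + \frac{C_{dep}}{C_{ox}}\right),
\]
\[
S \;\geq\; \ln(10)\,\frac{kT}{q} \;\approx\; 60\,\mathrm{mV/decade}
  \quad \mbox{at } T = 300\,\mathrm{K}.
\]
Under the same assumptions, the off-state current is approximately
$I_{off} \approx I_0 \, 10^{-V_T/S}$, so each reduction of $V_T$ by
$S$ raises leakage by roughly a factor of ten. A device whose $S$ is
not bounded by $kT/q$ can, in principle, reach the same on/off current
ratio with a smaller voltage swing.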

Heterojunction Tunnel FETs (HTFETs; TFETs for short) are among the
most promising steep-slope devices, in terms of both subthreshold
operation and high-speed switching, and they have been shown to scale
well into future process nodes~\cite{Lu-tfet-scaling}. A sufficient
number of logic and memory circuits have been designed using TFETs to
populate standard cell libraries, allowing synthesis models for larger
TFET designs.  Experimental demonstrations of III-V n-FET and SiGe
p-FET co-integration in~\cite{czornomaz-iedm13} show that
heterogeneous integration of TFET and CMOS devices is possible even on
the same die, and integration via 3D stacking is no more difficult
than CMOS-on-CMOS stacking.

TFETs do, however, suffer in some respects when compared to CMOS
technology. As the supply voltage is increased, the inherent
limitation in the TFET charge-carrying mechanism causes the current to
saturate above a certain operating voltage.  Because the tunneling
current saturates, the switching delay remains constant beyond
that voltage. At the processor level, this translates to a more
limited range of practical operating frequencies for TFET-based
processors.
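
A common first-order way to see this, sketched here under the
assumption of a simple $CV/I$ delay model rather than the calibrated
models used in this work, is to write the gate delay $t_d$ and the
resulting maximum clock frequency $f_{max}$ as
\[
t_d \approx \frac{C_L \, V_{dd}}{I_{on}(V_{dd})},
\qquad
f_{max} \approx \frac{1}{N_{crit}\, t_d},
\]
where $C_L$ (load capacitance), $I_{on}$ (on-current) and $N_{crit}$
(number of gate delays on the critical path) are illustrative symbols
that are not defined elsewhere in this paper. In a CMOS device,
$I_{on}$ keeps growing with $V_{dd}$, so $t_d$ falls as the supply is
raised; once the TFET tunneling current saturates, $I_{on}$ no longer
grows, $t_d$ stops decreasing, and $f_{max}$ plateaus.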


\subsection{Extrapolation to Processor Model}
The device models used for simulating TFET and FinFET characteristics
are described in detail in~\cite{karthik-isca}.
%These models were incorporated into a full system architectural simulator like GEMS~\cite{gems} and calibrated to match existing processor designs from the Ultra-SPARC family of processors.
We validated these models using
FabScalar~\cite{fabscalar}, which generates synthesizable HDL code for
different micro-architectural configurations. Synthesizing
FabScalar cores of different issue widths enabled us to determine
the critical-path delay and the core power and to match them against
the values obtained from our models.

Figure~\ref{fig:crossover} shows the variation in total core power
with frequency for the Si FinFET and for both the Low Operating Power
(LOP) and Low Leakage (LSTP) TFET models.  The crossover frequency
$F_{c}$ is defined as the frequency below which TFET processor
operation is more energy efficient than that of the CMOS FinFET-based
processor.  The lower leakage energy of the LSTP HTFET results in more
efficient operation compared to the LOP TFET device. Incorporating
wire components into the existing processor model further increases
the crossover frequency, as the wire delays have limited
sensitivity to the change in transistor type.
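
One way to make this definition concrete, as a sketch that assumes the
usual decomposition of core power into dynamic and leakage components
(the symbols $\alpha$, $C_{eff}$, $I_{leak}$ and the per-technology
voltage-frequency relation $V_{dd,X}(f)$ are illustrative and are not
taken from our models), is
\[
P_X(f) \approx \alpha \, C_{eff,X} \, V_{dd,X}^{2}(f) \, f
  + V_{dd,X}(f) \, I_{leak,X},
\quad X \in \{\mathrm{TFET}, \mathrm{CMOS}\},
\]
\[
P_{\mathrm{TFET}}(F_c) = P_{\mathrm{CMOS}}(F_c),
\qquad
P_{\mathrm{TFET}}(f) < P_{\mathrm{CMOS}}(f) \ \ \mbox{for } f < F_c .
\]
Assuming the same microarchitecture, both processors complete the same
number of cycles per second at a given $f$, so lower power at that
frequency implies lower energy per cycle; this is the sense in which
TFET operation is more energy efficient below $F_c$.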

%Works such as~\cite{iedm13-rooyackers} and~\cite{iedm13-tomioka} have
%demonstrated heterogeneous integration of Si-FinFET and III-V devices,
%which can make it possible to manufacture CMOS and TFET cores on a
%single layer. However, the differing operating voltages preferred by
%the two devices demand a coarse-grained integration, and we do not
%consider device heterogeneity below the granularity of a core or a
%cache. Hence, our design space covers heterogeneous system with both
%CMOS and TFET cores and CMOS last-level caches.

\begin{figure}[ht!]
\centering
\epsfig{file=figs/crossover.eps, angle=0, width=1\linewidth, clip=}
\caption{\label{fig:crossover} Comparison of power-frequency characteristics of Si FinFET, LOP TFET, and Low Leakage TFET based processors.}
\vspace{-0.2in}
\end{figure}
